Moving to Production: The Deployment Mindset

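This final module bridges the gap between successful research that reaches high accuracy in a notebook and reliable execution in production. Deployment is the core process of turning a PyTorch model into a minimal, self-contained service that delivers efficient predictions to users with low latency and high availability.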

1. Shifting the Mindset for Production

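The exploratory environment of a Jupyter notebook is stateful and fragile, which makes it a poor fit for production. We have to move away from exploratory scripting and restructure the code into structured, modular components that can handle concurrent requests, use resources efficiently, and integrate smoothly with larger systems.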

Low-latency prediction: keeping prediction time consistently below a target latency (for example, $50\text{ms}$). This is essential for real-time applications.

High availability: designing the service to be reliable and stateless, and to recover quickly when a failure occurs.

Reproducibility: ensuring that the deployed model and its environment (dependencies, weights, configuration) match the validated research results exactly.
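The low-latency target above is usually checked against a percentile rather than an average, because a few slow requests can hide behind a good mean. The following is a minimal sketch of such a check using only the standard library. The function names, the choice of the 95th percentile, and the stand-in `fake_predict` are assumptions made for illustration; only the 50 ms budget comes from the text.

```python
import statistics
import time


def measure_latencies_ms(predict, sample, runs=200, warmup=20):
    """Time repeated calls to predict(sample); return per-call latencies in ms."""
    for _ in range(warmup):  # warm-up calls are executed but not timed
        predict(sample)
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def latency_report(latencies_ms, budget_ms=50.0):
    """Summarize latencies and compare the 95th percentile with the budget."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95 = statistics.quantiles(latencies_ms, n=100)[94]
    return {
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": p95,
        "within_budget": p95 <= budget_ms,
    }


if __name__ == "__main__":
    # Stand-in for a real model call (assumption): a cheap pure-Python function.
    def fake_predict(x):
        return sum(v * v for v in x)

    print(latency_report(measure_latencies_ms(fake_predict, [0.1] * 1000)))
```

In practice `predict` would be the service's real inference call, measured on representative inputs and on the target hardware.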

Focus: The Model Service

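Instead of deploying the entire training script, we deploy a minimal, self-contained service wrapper. This service should handle only three tasks: load the optimized model artifact, apply input preprocessing, and run a forward pass to return the prediction.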
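The three tasks above can be sketched as a small class. This is a structural illustration under stated assumptions, not a definitive implementation: `ModelService`, `load_stub_model`, and the pixel-scaling preprocessing are hypothetical names and choices, and a plain Python callable stands in for the PyTorch model so the sketch runs without extra dependencies.

```python
from typing import Callable, List, Sequence


class ModelService:
    """Minimal wrapper: load once, preprocess, forward pass, return prediction."""

    def __init__(self, load_model: Callable[[], Callable[[List[float]], List[float]]]):
        # Load the artifact exactly once, at service start-up, not per request.
        # With PyTorch this loader would typically wrap torch.jit.load(path)
        # followed by .eval(); a plain callable is used here as a stand-in.
        self._model = load_model()

    @staticmethod
    def preprocess(raw: Sequence[float]) -> List[float]:
        # Hypothetical preprocessing: scale 0-255 pixel values to [0, 1].
        return [v / 255.0 for v in raw]

    def predict(self, raw: Sequence[float]) -> int:
        # No per-request state is stored on self, so calls stay independent.
        scores = self._model(self.preprocess(raw))
        # Return the index of the highest score (the predicted class).
        return max(range(len(scores)), key=scores.__getitem__)


def load_stub_model():
    """Stand-in loader (assumption): returns a toy two-class scoring function."""
    def model(features):
        mean = sum(features) / len(features)
        return [1.0 - mean, mean]  # score for class 0, score for class 1
    return model


service = ModelService(load_stub_model)  # constructed once at start-up
print(service.predict([250, 240, 255]))  # bright input
print(service.predict([3, 10, 0]))       # dark input
```

In a web service, the object would be constructed during application start-up and the request handler would only call `predict`, which keeps the expensive load out of the request path.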

Question 1

Which feature of a Jupyter notebook makes it unsuitable for production deployment?

It primarily uses Python code

It is inherently stateful and resource-intensive

It cannot directly access the GPU

Question 2

What is the primary purpose of converting a PyTorch model to TorchScript or ONNX before deployment?

Optimization for faster C++ execution and reduced Python dependency

To prevent model theft or reverse engineering

To automatically handle input data preprocessing

Question 3

When designing a production API, when should the model weights be loaded?

Once, when the service initializes

At the start of every prediction request

When the first request to the service is received

Challenge: Defining the Minimal Service

Plan the structural requirements for a low-latency service.

You need to deploy a complex image classification model ($1\text{GB}$) that requires specialized image preprocessing. It must handle $50$ requests per second.
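As a rough sizing aid for this scenario, Little's law relates the arrival rate and the time each request spends in the service to the average number of requests in flight. The sketch below assumes the 50 ms latency used as an example earlier in the module and a 70% utilization target; both values, and the function name `required_workers`, are assumptions rather than part of the problem statement.

```python
import math


def required_workers(requests_per_s, latency_s, target_utilization=0.7):
    """Little's law: average in-flight requests = arrival rate x time in system."""
    in_flight = requests_per_s * latency_s
    # Leave headroom so workers are not saturated; 0.7 is an assumed target.
    return math.ceil(in_flight / target_utilization)


workers = required_workers(requests_per_s=50, latency_s=0.050)
model_size_gb = 1  # from the problem statement
print(workers, "workers,", workers * model_size_gb, "GB if each loads its own copy")
```

The memory figure is the reason the model must be loaded once per worker and kept resident: reloading $1\text{GB}$ of weights on every request would dominate the latency budget.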

Step 1

To ensure high throughput and low average latency, what is the single most critical structural change needed for the Python script?

Solution:
Refactor the codebase into isolated modules (Preprocessing, Model Definition, Inference Runner) and ensure the entire process is packaged for containerization.

Step 2

What is the minimum necessary "artifact" to ship, besides the trained weights?

Solution:
The exact code/class definition used for preprocessing and the model architecture definition, serialized and coupled with the weights.
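One way to keep the weights coupled to their preprocessing settings and environment is to ship a small manifest next to them and verify it at start-up. The following is a minimal sketch using only the standard library; the manifest fields, the function names, the file name `model.pt`, and the sample configuration values are hypothetical and chosen for illustration.

```python
import hashlib
import json
import platform
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(weights: Path, preprocess_config: dict, dependencies: dict) -> dict:
    """Record what is needed to tie the weights to their validated environment."""
    return {
        "weights_file": weights.name,
        "weights_sha256": sha256_of(weights),
        "preprocess": preprocess_config,
        "dependencies": dependencies,
        "python": platform.python_version(),
    }


def verify_manifest(manifest: dict, weights: Path) -> bool:
    """Return True only if the weights on disk match the recorded checksum."""
    return manifest["weights_sha256"] == sha256_of(weights)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        weights = Path(tmp) / "model.pt"            # hypothetical file name
        weights.write_bytes(b"placeholder weights")  # stands in for real weights
        manifest = build_manifest(
            weights,
            preprocess_config={"resize": [224, 224]},    # illustrative values
            dependencies={"torch": "<pinned version>"},  # placeholder, not a real pin
        )
        print(json.dumps(manifest, indent=2))
        print("verified:", verify_manifest(manifest, weights))
```

A service that refuses to start when `verify_manifest` returns False fails fast instead of silently serving predictions from mismatched weights.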